27. Quiz: State-Value Functions
In this quiz, you will calculate the value function corresponding to a particular policy.
Each of the nine states in the MDP is labeled as one of \mathcal{S}^+ = \{s_1, s_2, \ldots, s_9\}, where s_9 is a terminal state.
Consider the (deterministic) policy that is indicated (in orange) in the figure below.
The policy \pi is given by:
\pi(s_1) = \text{right}
\pi(s_2) = \text{right}
\pi(s_3) = \text{down}
\pi(s_4) = \text{up}
\pi(s_5) = \text{right}
\pi(s_6) = \text{down}
\pi(s_7) = \text{right}
\pi(s_8) = \text{right}
Recall that since s_9 is a terminal state, the episode ends immediately if the agent begins in this state. Thus the agent never has to choose an action there (so we won't include s_9 in the domain of the policy), and v_\pi(s_9) = 0.
Take the time now to calculate the state-value function v_\pi that corresponds to the policy. (You may find that the Bellman expectation equation saves you a lot of work!)
Assume \gamma = 1.
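For reference, since both the policy and the transitions in this gridworld are deterministic, the Bellman expectation equation

v_\pi(s) = \mathbb{E}_\pi[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s]

reduces to

v_\pi(s) = r + \gamma v_\pi(s'),

where r and s' are the reward and next state that follow from choosing action \pi(s) in state s. With \gamma = 1, each state's value is just the one-step reward plus the value of the successor state, so you can work backward from the terminal state s_9.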
Once you have finished, use v_\pi to answer the questions below.
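If you'd like to check your arithmetic, here is a minimal Python sketch of iterative policy evaluation for this quiz. It assumes the standard 3x3 grid layout (s_1 through s_3 across the top row, s_9 in the bottom-right corner), so the successor map below is derived from that assumption; the per-transition rewards live in the figure and are mostly placeholders here, so fill them in from the figure before trusting the printed values.

```python
# Minimal sketch: iterative policy evaluation for a deterministic policy
# with gamma = 1. Successor states assume the quiz's 3x3 grid layout
# (s1..s3 top row, s4..s6 middle row, s7..s9 bottom row; s9 terminal).

GAMMA = 1.0

# pi(s) -> next state, following the policy listed above.
successor = {
    "s1": "s2", "s2": "s3", "s3": "s6",   # right, right, down
    "s4": "s1", "s5": "s6", "s6": "s9",   # up, right, down
    "s7": "s8", "s8": "s9",               # right, right
}

# r(s, pi(s)) for each non-terminal state. PLACEHOLDERS: only
# r(s1, right) = -1 and r(s7, right) = -3 are pinned down by the quiz
# statements; read the remaining rewards off the figure.
reward = {
    "s1": -1,  # implied by statement (3)
    "s2": -1, "s3": -1, "s4": -1, "s5": -1, "s6": -1,
    "s7": -3,  # implied by statement (2)
    "s8": -1,
}

def evaluate_policy(successor, reward, gamma=GAMMA, tol=1e-10):
    """Sweep v(s) <- r(s, pi(s)) + gamma * v(s') until convergence."""
    v = {s: 0.0 for s in successor}
    v["s9"] = 0.0  # terminal state: v_pi(s9) = 0 by definition
    while True:
        delta = 0.0
        for s, s_next in successor.items():
            new_v = reward[s] + gamma * v[s_next]
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v

if __name__ == "__main__":
    v_pi = evaluate_policy(successor, reward)
    for s in sorted(v_pi):
        print(f"v_pi({s}) = {v_pi[s]:g}")
```

Because every state's path under \pi reaches the terminal state s_9, the sweep converges after a handful of iterations even with \gamma = 1.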
Question 1
What is v_\pi(s_4)?
SOLUTION:
1
Question 2
What is v_\pi(s_1)?
SOLUTION:
2
Question 3
Consider the following statements:
- (1) v_\pi(s_6) = -1 + v_\pi(s_5)
- (2) v_\pi(s_7) = -3 + v_\pi(s_8)
- (3) v_\pi(s_1) = -1 + v_\pi(s_2)
- (4) v_\pi(s_4) = -3 + v_\pi(s_7)
- (5) v_\pi(s_8) = -3 + v_\pi(s_5)
SOLUTION:
- (2)
- (3)
Statements (2) and (3) match the Bellman expectation equation: under \pi, s_7 transitions right to s_8 (with reward -3), and s_1 transitions right to s_2 (with reward -1). The other statements fail because the successor states are wrong: under \pi, s_6 transitions down to s_9 (not to s_5), s_4 transitions up to s_1 (not to s_7), and s_8 transitions right to s_9 (not to s_5).